DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters

Authors

You-Luen Lee

Da-Cheng Juan

Xuan-An Tseng

Yu-Ting Chen

Shih-Chieh Chang

Abstract

When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so that preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss or, even worse, reliability degradation of a datacenter. We further propose a two-stage framework, DC-Prophet, based on One-Class Support Vector Machine and Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and an F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score.
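
For readers unfamiliar with the metric, the following is a minimal sketch of how an F3-score can be computed, assuming it denotes the standard F-beta score with beta = 3, which weights recall more heavily than precision (a missed failure costs more than a false alarm). The function name `f_beta` and the confusion counts in the usage lines are hypothetical and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """F-beta score from confusion counts (assumed standard definition).

    tp: failures correctly predicted
    fp: false alarms
    fn: failures that were missed
    beta > 1 weights recall more than precision.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not results from the paper):
# precision is 0.6 and recall is 0.9.
f3 = f_beta(tp=90, fp=60, fn=10)             # beta = 3
f1 = f_beta(tp=90, fp=60, fn=10, beta=1.0)   # beta = 1
print(f"F3 = {f3:.3f}, F1 = {f1:.3f}")
```

With these illustrative counts the F3 value sits closer to the recall than the F1 value does, which is why a recall-weighted score suits failure prediction.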

Similar resources

Communication-Aware Traffic Stream Optimization for Virtual Machine Placement in Cloud Datacenters with VL2 Topology

By pervasiveness of cloud computing, a colossal amount of applications from gigantic organizations increasingly tend to rely on cloud services. These demands caused a great number of applications in form of couple of virtual machines (VMs) requests to be executed on data centers’ servers. Some of applications are as big as not possible to be processed upon a single VM. Also, there exists severa...

Transfer Learning-Based Co-Run Scheduling for Heterogeneous Datacenters

Today’s data centers are designed with multi-core CPUs where multiple virtual machines (VMs) can be colocated into one physical machine or distribute multiple computing tasks onto one physical machine. The result is co-tenancy, resource sharing and competition. Modeling and predicting such co-run interference becomes crucial for job scheduling and Quality of Service assurance. Co-locating inter...

VM Consolidation by using Selection and Placement of VMs in Cloud Datacenters

The Cloud Computing model leverages virtualization of computing resources allowing customers to provision resources on-demand on a pay-as-you-go basis. During recent years, the power consumption of datacenters in cloud environment attracted researchers. Optimization of energy consumption can be performed by different methods including virtual machine (VM) consolidation. This technique can reduc...

Communication-efficient Outlier Detection for Scale-out Systems

Modern scale-out services are built on top of large datacenters composed of thousands of individual machines. These must be continuously monitored because unexpected failures can overload fail-over mechanism and cause large-scale outages. Such monitoring can be accomplished by periodically measuring hundreds of performance metrics and looking for outliers, often caused by misconfigurations, har...

High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads

Google’s Ads Data Infrastructure systems run the multibillion dollar ads business at Google. High availability and strong consistency are critical for these systems. While most distributed systems handle machine-level failures well, handling datacenter-level failures is less common. In our experience, handling datacenter-level failures is critical for running true high availability systems. Mos...

Publication date: 2017
